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ABSTRACT 

We present measurements of the mean mid-infrared-to-submillimctcr flux densities 
of massive (M+ > 10 11 Mg) galaxies at redshifts 1.7 < z < 2.9, obtained by stacking 
positions of known objects taken from the GOODS NICMOS Survey (GNS) catalog on 
maps at: 24 urn (Spitzer /MIPS); 70, 100, and 160 urn (Herschel /P ACS); 250, 350, and 
500 pm (BLAST); and 870 urn (LABOCA). A modified blackbody spectrum fit to the 
stacked flux densities indicates a median [interquartile] star-formation rate of SFR = 
63 [48, 81] Mq yr _1 . We note that not properly accounting for correlations between 
bands when fitting stacked data can significantly bias the result. The galaxies are 
divided into two groups, disk-like and spheroid-like, according to their Sersic indices, 
n. We find evidence that most of the star formation is occurring in n ^ 2 (disk-like) 
galaxies, with median [interquartile] SFR = 122 [100, 150] Mq yr _1 , while there are 
indications that the n > 2 (spheroid- like) population may be forming stars at a median 
[interquartile] SFR = 14 [9, 20] Mq yr _1 , if at all. Finally, we show that star formation 
is a plausible mechanism for size evolution in this population as a whole, but find only 
marginal evidence that it is what drives the expansion of the spheroid- like galaxies. 

Key words: galaxies: evolution — galaxies: high-redshift — infrared: galaxies. 



1 INTRODUCTION 

The observed structural properties of massive galaxies 
(M* > IO^Mq) at high redshift (z > 1) are difficult 
to reconcile with those of galaxies that populate the lo- 
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cal Universe. Most strikingly, they are on average much 
more compact in s ize than local galax ies of similar mass 
l|Daddi et all 120051 : iTruiillo et ai]|2006h . For the spheroid- 
like galaxy population, this size evolution has been par- 
ticularly dramatic ( a factor of 4-5 since z ~ 2, see e.g., 
Truiillo et ail 120071 : IBuitrago et alj 120081 ; IDamianov et al.l 
2009), with subsequent observations confirming the se find- 
ings (e.g., iMuzzin et all 120091 ; ITruiillo et all 1201 ll ). Only 
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a tiny fraction of massive galaxies in the local Universe 
have sizes compara ble to those found at high redshift 
l)Truiillo et al.l [2009). The absen ce of similar mass coun- 
terparts in the local Universe (|Truiillo et al.l 120091 ) im- 
plies that some mechanism is acting o n those high-redshift 
galaxies to make the m grow in size (Hopk ins et al ] 120091 : 
iBezanson et al.ll2009T ). 

In order to understand the mechanism responsible for 
this galaxy growth, a crucial point that needs to be ad- 
dressed is the level of star formation (or star-formation 
rate: SFR) in this population. From an observational point 
of view, evidence for star formation in massive galaxies at 
high redshift is unclear, especially for the spheroid-like pop- 
ulation. For example, small sam ples of high-quality spec- 
troscopy (|Kriek et al.ll2006l , 120091 ) find little or no star forma- 
tion in this population; whereas, abou t 50% of these galaxies 
appe ar to have 24 \xra counterparts |Perez-Gonzalez et al.l 
l2008f ) . indicating an elevated level of star formation. This 
discrepancy may be due to biases inherent to their respec- 
tive SFR estimators, which are either susceptible to errors 
in extinction correction and require deep spectroscopic ob- 
servations; or probe emission from polycyclic aromatic hy- 
drocarbons (PAHs), and thus provide a poor constraint on 
the thermal spectral energy distribution (SED). 

An alternative probe of star formation is to observe in 
the far-infrared/submillimeter bands (FIR/submm), where 
emission is primarily from heated dust. It is known that 
in the local Universe the dust luminosit y in star-forming 
regions is correlated with SFR (e.g ., iKennicuttl Il998l ; 
ICharv fc Elbazl l200ll ; iBuat" et alj 120070 . with the most ac- 
tively star-forming galaxies often the m ost dust obscured o r 
even optically thick in the optical/UV (Ge nzel et al.|[l99^ ). 
Therefore, it is reasonable to expect that if high-redshift, 
compact, massive galaxies are vigorously forming stars, then 
they should be observable in the rest-frame FIR/submm. 

However, due to the large beams of current submm tele- 
scopes, source confusion and flux boosting present signifi- 
cant obstacles to studying the star formation properties of 
anyt hing other than the m ost luminous galaxies at high red- 
shift IjMoncelsi et al.ll2011h . For example, the 1-a noise floor 
due to c onfusion in the 250 urn band of //er,sc/ie//SPIRE is 
5.8 mjy l|Nguven et al.ll201ol '). which corresponds to the flux 
from galaxies at z ~ 2 with bolometric FIR luminosities 
of Lfir ~2x 10 12 Lq, i.e., ultra- luminous infrared galaxies 
(ULIRGs). As a result, a catalog of galaxies at 2 > 2 robustly 
detected above the confusion noise (5-cr) in the submm can 
only probe the bright end of the luminosity distribution. 
Stacking provides a mechanism to examine the full distribu- 
tion, provided a r eliable external catalog extending to fain t 
fluxes is available (jMarsden et al. 2009; Pascale et al.l 2009). 

In this work we perform a stacking analysis using a 
catalog of distant m assive galaxies from t he GOODS NIC- 
MOS Survey (GNS; IConselice et al.ll201lh — which we se- 
lect to have stellar masses M* ^ lO n M a nd redshifts 
1.7 < z < 2.9 — on maps from: S pitzer /MIPS ilRieke et all 
l2004h at 24 um; Herschel/PACS l|Poglitsch et all 120101 ) at 
70, 100, and 160 [im; the Balloo n-borne Large Apert ure Sub- 
millimeter Telescope (BLAST; iDevlin et all I2009T ) at 250, 
350, and 50 urn; and the La rge APEX Bolometer Camera 
(LABOCA: IWeifi et al.ll2009l) at 870 urn. Our objective is to 
estimate the average SFRs of high-redshift massive galax- 
ies, and to look for differences between the disk-like and 




Figure 1. GNS catalog positions (white circles, 36" in diame- 
ter, solid are n ^ 2; dotted are n > 2) overlaid on a 20' X 20' 
region of the BLAST 250 urn map in GOODS-South. The over- 
lapping Herschel/PACS region is outlined as a da shed box. The 
map has been convolved with a matched-filter (see lChapin et all 
l201lf) to help enhance the regions of submm emission. Most of 
the sources in our catalog lie along regions of faint emission. Note 
that the BLAST beam is many (~ 18-30) times larger than a 
resolved galaxy, necessitating the stack. Furthermore, since the 
angular resolution of Herschel /SPIKE images will only improve 
by a factor of two, stacking will still be required to understand 
the FIR/submm properties of the faint population. 



spheroid-like galaxies. An alternative approach, based on 
counterpart ide ntification of simil ar GNS catalog sources, 
is carried out by I Cava et al.l (|2010h . 

When required, we adopt the concordance model, a flat 
ACDM cosmology with Cl M = 0.2 74, Q A = 0.726, Hp = 
70.5 km s^Mpc -1 , and cr 8 = 0.81 jHinshaw et al.ll2009n . 



2 DATA 

We perform our analysis on the Great Observatories Origins 
Deep Survey South field (GOODS-South), also known as 
the Extended Chandra Deep Field South (E-CDFS), which 
has field center coordinates 3 h 32 m 30 s , -27°48'20". Here we 
briefly describe the catalog and maps. 



2.1 Mass-Selected Catalog 



Our catalog is the lBuitrago et all l|200ct ) su bset of the pub- 
licly available GOODS NICMOS SurvejQ (|Conselice et all 
2011). Here we summa rize its main f e ature s; for a more 
detail ed des cription see Buitrago et al.l l|2008l L iBluck et al.l 
l|2009h and IConselice et al.l (|201ll ). For details concerning 



1 http : //www.nottingham. ac . uk/astronomy/gns/index .html 
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the data reduction procedure see iMagee et all (|2007h . The 
GNS is a large HST NICMOS-3 camera program of 60 H- 
band pointings (180 orbits), with limiting magnitudes of 
H ~ 26.8 (5-cr), optimized to collect data for as many mas- 
sive {Mi, > 10 1X M©) gal axies as possible at high redshift 
(1.7 < z < 2.9), making it the largest sample of such galax- 
ies to date. Of these, 36 are in the southern field for which 
we have infrared and submm maps. 

Redshifts and stellar masses of these objects are cal- 
culated using the BVRIizJHK filters. Photo metric redshifts 
are f ound using standard techniques (e.g., IConselice et al.l 
l2007h . while spectroscopi c redshifts for 7 objects are com- 
piled from the literature dWuvts et al.ll2008l ; iPopesso et all 
|2009| ; iBalestra et aill20irj) . Stellar masses of these objects 
are estimated by fitting the multi-color photometry to model 
SEDs — produced with stellar population syn thesis models 
— re sulting in uncertainties of ~ 0.2 dex (e.g.. lBundv et all 

l2ooeh . 

Additionally, due to the excellent depth and resolu- 
tion of the NICMOS images (pixel scale after resampling 
of 0."l pixel -1 , and a point spread function [PSF] of 0."3 
full width half m aximum [FWHM]), we are able to estimate 
the Sersic (196 jj) indices and s izes of the objects using the 
GALFIT code l|Peng et al.l [20021 ). Average properties of the 
sources used in our analysis are listed in Table [1] 

The selection for the GNS galaxies is based on mass 
and redshift, with 1.7 < z < 2.8. These galaxies were 
located initial ly through c olor selection techniques , such 
as the B zK jDaddi et alj l2007f) . ERO |Yan et all I2004T ) 
and DRG iPapovich et al.l |2006l ) criteria, and later refined 
through spectroscop ic and photome t ric red shifts within the 
two GOODS fields. IConselice et al.l (|201ll ) perform several 
tests to ensure that the sample is complete. A possible bias 
might be that extremely dusty galaxies could be missed by 
this criteria due to attenuation, but the deep limiting H- 
band magnitude greatly exceeds that of the expected up per 
bound for dusty SMGs (~ 23.3 mag. iFraver et al.ll2004T ). so 
that we are confident that we are not missing the dustiest 
galaxies due to attenuation. Lastly, it is expected that this 
selection of galaxies closely approximates the true ratio of 
red to blue galaxies in these mass and redshift ranges. For 
more detai ls concerning the selec tion technique and possible 
biases (see IConselice et al.ll201lT ). 



2.2 Spitzer 

We use the publicly available Spitzer /MIPS map at 24 urn 
from the Far Infrared Deep Extragalactic Legacy Survey 
(FIDEL 0). The 5-cr point source sensitivity of this map 
is 0.03 mjy. 



2.3 PACS 



We u se publicly available Herschel /PACS (Pog htsch et al.l 
|2010D observations of the GOO DS-South field f rom the 
PACS Evolutionary Prob«0 (PEP; iLutz et al.ll201ll ) survey. 
The data was re-pr ocessed wi th the Herschel Processing En- 
vironment (HIPE; [Ott] l20ld . continuous integration build 



number 6.0.2110). PEP was designed to provide data in all 
three PACS bands. Since PACS can only observe in two 
bands simultaneously — at 160 vim (red) and either 70 (blue) 
or 100 u.m (green) — we use two sets of observations to pro- 
duce maps at all three wavelengths. We combine the avail- 
able deep observations using the standard PACS pipeline, 
choosing a high-pass filter parameter of 20 for the blue and 
green bands, and 30 for the red band (corresponding to sup- 
pression of scales larger than 40 and 60" on the sky, respec- 
tively; see iMiiller et al.ll201lf) . In order to prevent ringing 
effects around bright sources caused by the high-pass filter 
the pipeline performs an initial crude reduction and auto- 
matically masks out the brightest sources in the subsequent 
iterations of de-glitching and filtering. The r.m.s. depths of 
the final maps are 0.31, 0.44, and 1.5 mjy at 70, 100, and 

160 \im, respectively. 

As reported by IMiiller et~aH l|201ll ). the relatively 
strong high-pass filter adopted along with the masking of 
the bright sources may attenuate the final photometry of 
faint sources. To account for these effects, we produce maps 
of a few, isolated, unmasked, faint point sources of differ- 
ent flux density, using the same parameters as were used in 
the reduction of the GOODS-South maps; we then mask 
these sources and create new maps. We use the average 
ratio of the flux densities of the same sources in the two 
maps as our estimate of the attenuation factor due to the 
high-pass filter. We find that the magnitude of the attenua- 
tion mildly increases for increasing wavelengths, as expected 
given the shape of the 1/f noise o ver the relevant frequency 
range (<x r a5 ; lLutz et al.l [2oTll ') . The estimated attenua- 
tion factors are 0.80, 0.78, and 0.75 at 70, 100, and 160 um, 
respectiv ely. Note th a t a sli ghtly different approach was fol- 
lowed by ILutz et al.l (|201ll ). who perform tests on the red 
band by adding simulated sources to the timelines before 
masking and high-pass filtering; they find that the filtering 
modifies the fluxes by 16% for very faint unmasked point 
sources. Despite the slight disagreement with our finding at 
160 um, and because of the lack of an estimate for the blue 
and green bands from the PEP team, we choose to adopt 
our three estimated factors for consistency. 



2.4 BLAST 

The BLAST maps in GOODS-Soutffl consist of a deep 
region covering ~ 0.9 deg 2 which complete l y enco mpasses 
the southern sources in the iBuitrago et al.l (|2008l ) catalog 
(Figure [TJ, and have r.m.s. depths o f 11, 9, and 6mJy , 
at 250, 350, and 500 um, respectively iDevlin et al.ll2009l ). 
Due to large instrumental beams (36, 42, and 60") and 
steep source counts (ap proximately following dN/dS tx S -3 ; 
IPatanchon et all 120091 ) . source confusion contributes sub- 
stantially to the noise in these maps, and is estimated 
to be c con f usion ~ 21 , 17, and 15 mjy in the three bands 
|Marsden et all 120091 '). The BLAST maps were made us- 
ing a naive mapmaker (|Pascale et al.ll201ll). Further details 
on the instrument may be found in Pascale et alj ll 20081) . 
while flight perfor mance and calibration are provided in 
iTruch et all i|2009h . 



2 http : //data. spitzer . caltech. edu/popular/f idel/20070917_enhanced/docs/f idel_dr2 .html/ 

3 http://www.mpe.mpg.de/ir/Research/PEP/ 4 Available at: http: //blastexperiment . info/results .php 
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-^median 


z i q r 


M* 
(M ) 


Re 

(kpc) 


n 


T 

(K) 




^FIR 

(L©) 


SFR 

(MQyr- 1 ) 


All 

n sC 2 
n > 2 


36 
20 
16 


2.285 
2.285 
2.270 


1.980-2.500 
2.085-2.500 
1.865-2.625 


1.85 x 10 11 
1.93 x 10 11 
1.74 x 10 11 


2.00 
2.43 
1.49 


2.03 
1.05 
3.25 


29.4^0 8 [27.3,31.6] 
32.6±o.4 [30.8,34.6] 
27.6l°-j [24.2,30.8] 


12.0±^ 


[4.7,8.0] x 10 11 
[9.8,14.8] x 10 11 
[0.9,2.0] x 10 11 


63±\{ [48, 81] 
122±J| [100, 150] 
14^8 [9, 20] 



Table 1. Average properties of stacked samples. R c is the effective radius. Dust temperatures, bolometric FIR luminosities and SFRs, 
corrected to a Chabrier 1(20031) IMF, are shown with the corresponding upper and lower Gaussian uncertainties, and interquartile ranges 
in square brackets (see Section [4] for details). 



2.5 LABOCA 



The LABOCA E-CDFS Submm Survey (LESS: IWeifi et all 
2009) provides deep 870 u.m data, with an r.m.s. depth to 
better than 1.2 mjy across the full 30' x 30' field, with an 
effective resolution of 27" FWHM. For a d etailed description 
of the instrument see lSiringo et al. | (|2009h . 



3 METHOD 



3.1 Stacking Formalism 



Stacking is a well established technique for finding the av- 
erage properties of objects which individually are unde- 
tectable b y using external knowledge of their positions in a 
map ( e.g.jDole et alj2006l; IWang et al]|2006l ; iMarsden et"all 



20091; iPascale et all l2009l ). We follow the formalism of 
Marsden et al.l (J2009J, hereafter M09), to which we refer to 
for a full description of the stacking method. Here we sum- 
marize the salient features of the technique. 

M09 showed that the mean flux density of an exter- 
nal catalog is simply the covariance of the mean-subtracted 
map with the catalog, divided by the variance of the catalog 
density. If the catalog is Poisson-distributed, then a pow- 
erful diagnostic is that the variance of the source density 
should equal the mean, and the average flux density can be 
re-written as the mean map value at the position of each 
catalog source. This is true no matter what the size of the 
beam or surface density of sources in the map, so long as the 
sources are uncorrelated at the scale of the beam. The algo- 
rithm has been extensively tested with Monte Carlo simula- 
tions on mock random maps with increasing source densities, 
and was shown consistently to recover the correct mean flux 
density, with no dependence on the number of sources per 
beam (Figure [2|. If however the catalog is clustered on the 
beam scale, the stacked flux will be biased high, compared 
to the properly normalized covariance, by a factor equal to 
the catalog variance at the beam scale divided by the mean 
source density. In the following section we show that this 
factor is consistent with unity for our data. 

Uncertainties and possible biases of our measurement 
are estimated by generating random catalogs and stacking 
them on the actual maps themselves. We find that the uncer- 
tainties are Gaussian-distributed and scale as the map r.m.s. 
(including confusion noise) divided by the square root of the 
number of catalog entries. Note that these uncertainties ac- 
count for both instrumental and source confusion noise, as 
well as for any pixel-pixel correlations introduced by the 
map-making algorithm (e.g., the "drizzling" technique used 
to produce the PACS maps with the standard pipeline). 




1.10 



Figure 2. Histograms showing the ratio of recovered stacked 
fluxes to true flux for 10,000 simulations. The stacks were per- 
formed on simulated 0.25 deg 2 maps based on a random catalog 
of 12,500 sources, with size and source densities typical for deep 
24 |am MIPS catalogs. We have repeated the test for six beam 
sizes in the range 10—60", which probe the effects of stacking at 
source densities ranging from 0.4 to 16 sources per beam. As de- 
scribed in Section 13.11 and in M09, larger beams lead to larger 
uncertainties, but in all cases, the stacked values are consistent 
with the true catalog flux, showing that there is no bias when 
stacking on uncorrelated catalogs. 



3.2 Testing the Poisson Hypothesis 

Stacking provides an unbiased estimate of the mean flux 
only when the sources in the sky are uncorrelated. While 
massi ve galaxies have bee n shown to cluster quite strongly 
(e.g., iFoucaud et ai1l201Qt ), we find that on scales relevant 
for this analysis they are essentially Poisson-distributed, as 
we show with the following tests: 

1) In the presence of clustering, the FWHM of the 
postage-stamp of stacked sources would be larger than 
the nominal instrumental PSF. We compare our measured 
stacked 24 [im PSF to that measured from stacking the 
sources used in M09 dMagnelli et al.l 120091 ). which were 
shown to be Poisson distributed (see Figure 3 of M09), and 
find that they are identical. 

2) If the sources are Poisson-distributed over a given 
scale, then by definition the average number of sources in a 
cell of that size should equal the variance. We test that by 
dividing the field into equal sized cells, from 2.7 to 0.225' on 
a side, and find that the ratio of the variance to the mean 
is consistent with unity at all scales. 
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3) In the presence of strong clustering around massive 
galaxies we would expect to find more sources per beam sur- 
rounding the galaxies than would be found at random. We 
calculate the number of sources inside a BLAST beam radius 
at the locations of each massive galaxy and compare that to 
what we would expect at random. From 1,000 Monte Carlo 
simulations we find 1.10 ± 0.13, 1.16 ± 0.17, and 1.28 ± 0.21 
sources per beam at 250, 350, and 500 pm, compared to 
the measured 1.04, 1.13, and 1.17, respectively. We extend 
this test to galaxies with log(M*/MQ) > 9, to account for 
the possibility of less massive galaxies clustering around our 
more massive ones. We find there are 2.85±0.40, 3.83±0.51, 
and 5.97 ± 0.73 sources per beam at 250, 350, and 500 urn, 
compared to the measured 2.53, 4.04, and 5.87, respectively. 
Thus, while there are multiple sources per beam at all wave- 
lengths, because their distribution is consistent with Pois- 
son, they do not bias the result. 

There still remains the possibility, however, that even 
fainter (< 13 pJy at 24 pm), undetected sources cluster 
around detected ones. We can estimate their potential con- 
tribution in the following way. If clustered, faint sources 
contribute significantly to the stacked flux density for large 
beams then after convolving the 24 pm map (whose beam is 
6") with a much larger beam we would expect the stacked 
flux density to increase. On the other hand, as described 
in the previous section, if the faint sources are Poisson- 
distributed then we would expect only the noise to increase. 
We find that after convolving the 24 urn map with a 60" 
beam, the stacked flux density per source is 0.08±0.11 mjy, 
compared to the original 0.081 ± 0.005 mjy (see Table [3}. 
Thus, the stacked signal does not change, but the errors in- 
crease substantially, which is consistent with what we would 
expect from additional, Poisson-distributed sources in the 
beam. We therefore conclude that the contribution from 
faint clustered sources is negligible. 



3.3 SED Fitting, IR Luminosities, and 
Star-Formation Rates 

We model the thermal dust emission as a modified black 
body with an SED of the form; 

S v =Av B(y,T), 



Band 



(1) 



where B(y,T) is the blackbody spectrum with amplitude 
A, and f3 is the emissivity index, which effectively takes into 
account the varia bility of dust tempe ratures within a single 
galaxy. Following iBlain et all (120031 1 we set /3 to 1.5. Ad- 
ditionally, we replace the mid-infrared exponential on the 
Wien side of the spectrum wit h a power-law of the form 
f v oc v~ a (with a — 2, following IBlain et al.ll2003t ). which in 
practice means imposing that the two functions and their 
first derivatives be equal at the transition frequency. In 
our case that transition occurs at rest-frame ~ 74 pm (for 
T ~ 30K; see §Kty. 

Our SED fitting procedure estimates the amplitude and 
temperature of the above template, keeping a and ft fixed. 
For the BL AST points, the SE D fitting procedure (described 
in detail in lChapin et alj|2008h takes the width and shape 
of the photometric bands into account, as well as the abso- 
lute photometric calibration uncertainty in each band (see 
iTruch et al]|2009h . Correlations due to instrumental noise 
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Table 2. Pearson correlation matrix for all bands. 



are estimated and accounted for with a Monte Carlo pro- 
cedure. Because we do not possess similar detailed data for 
Spitzer/MIPS and LABOCA, these photometric points are 
not color-corrected, whereas we do apply a color-correction 
to the P ACS points, followin g the standard procedure de- 
scribed in lMiiller et alj (|201ll . see their Table 4.2, for a power 
law v~ ); the color-correction factors are 1.016, 1.012, 1.017 
at 70, 100, and 160 um, respectively, and have a negligible 
impact on the final results. The PACS points are assumed 
to have completely uncorrelated instrumental noise among 
bands. 

Correlated confusion noise must also be accounted for in 
the fit, as these correlations reduce the significance of a com- 
bination of single band detections. We estimate the Pearson 
coefficients of the correlation matrix for all bands (see Ta- 
ble [2J from the beam-convolved maps within a region of 
0.064 deg 2 that encompasses all the sources in the GOODS- 
South NICMOS catalog. In §[L3]we will show how the effect 
of correlations between bands is quite signifi cant, especially 
amon g PACS and BLAST bands (see also iMoncelsi et al.l 
120111 ). and thus including them in the SED fitting algorithm 
is crucial. 

SEDs are corrected for redshift by assuming the median 
redshift for each subset (see column 3, Table[l}. Interquartile 
errors reflecting the uncertainty in dimming due to the width 
of the redshift bin are estimated with a Monte Carlo, where 
1000 mock redshifts with the same distribution as the chosen 
subset (i.e., all, disk-like, and spheroid-like) are drawn, and 
the dimming factor for each redshift is calculated. 

The resulting infrared luminosity, Lfir, is convention- 
ally the integral of the rest-frame SED between 8 and 
1000 pm, and the SFR is estimated as 

SFR [MQyr -1 ] = 1.728 x 10" 10 x L Fm [L ], (2) 

from iKennicuttJ (|l998T ). which assumes a Salpeter |l955|) ini - 
tial mass function (IMF). To convert to a Chabrier (|2003T ) 
IMF, log(SFR) must be corrected by lower ing 0.23 dex (e.g., 
iKriek et aill2009l ; Iron Dokkum et alj|20ich . 



4 RESULTS 

4.1 Stacking Results 

Stacked flux densities and l-cr uncertainties are reported in 
the second column of Table [3] We find statistically signifi- 
cant, non-zero signals in all the submm bands, with 2-, 3-, 
3-, and 4-<r detections at 250, 350, 500, and 870 pm, respec- 
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Band 

(H 


All 

(mjy/source) 


n ^ 2 (disk-like) 
(mjy/sourco) 


n > 2 (spheroid-like) 
(mjy /source) 


24 


0.081 ± 0.005 


0.130 ±0.007 


0.020 ± 0.007 


70 


0.16 ± 0.07 


0.36 ±0.09 


-0.05 ±0.10 


100 


0.39 ± 0.09 


0.84 ±0.13 


-0.17 ±0.14 


160 


1.2 ± 0.3 


2.9 ±0.5 


-0.66 ±0.50 


250 


5.0 ±2.9 


9.3 ±3.9 


-0.3 ±4.4 


350 


7.9 ± 2.3 


10.7 ±3.1 


4.5 ±3.5 


500 


5.3 ± 1.9 


6.2 ± 2.6 


4.2 ±2.9 


870 


0.97 ±0.26 


1.03 ±0.35 


0.9 ±0.4 



Table 3. The mean flux densities of massive galaxies in the GNS 
catalog from stacking. Reported are the results for all of the 
sources, as well as those identified as disk-like and spheroid-like, 
based on their Sersic indices, n. 



tively, as well as robust 16-, 3-, 4-, 4-cr detections at 24, 70, 
100, 160 [im, respectively. 

Next, we divide the catalog by Sersic index into: those 
with n > 2, which are spheroid- like and thus more likely 
to have suppressed star formation; and those with n ^ 2, 
which are disk-like and thus more lik ely to be actively form- 
ing stars (|Ravindranath et al.ll2004l ). Contamination of one 
population into the other has been shown with simulations 
to be very low (< 10%; iBuitrago et"aH l201ll ). but when 
galaxies do cross the n — 2 threshold, it is always from 
n < 2 to n > 2, i.e., from spheroid-like to disk-like. 

The results are listed in the third and fourth columns 
of Table [3] At 24 u.m, we measure a distinct signal from 
both populations, with 19-a and 3-<r detections from the 
disk-like and spheroid-like sources, respectively. At longer 
wavelengths, for the disk-like population we detect signals 
with greater significance than that of the combined catalog, 
between 2.5-a and 6.5-er, in each FIR/submm band; whereas 
for the spheroid-like population we find a much weaker sig- 
nal, with four bands consistent with zero. 

While the error on the stacks is Gaussian, the uncer- 
tainty associated with the average rest-frame Lfir is dom- 
inated by the width of the redshift distribution, which is 
not Gaussian. Thus, for estimating T, Lfir, and SFR (Sec- 
tion 14. 3[) . we choose to adopt the median value and in- 
terquartile range, as they best reflect the asymmetric shape 
of the redshift distribution, which ultimately determines the 
uncertainty of our measurement. For reference we also quote 
the Gaussian uncertainties. We anticipate that the lower 
Gaussian errors on T, Lfir, and SFR for the spheroid-like 
subset exceed the lower bound of the interquartile range, and 
reflect the elevated level of uncertainty in our measurement. 



4.2 Contribution of stellar emission 

At z ~ 2.3 the observed 24 [im band probes rest-frame wave- 
lengths of 6-7 urn, which in addition to PAH emission, is 
where the Rayleigh- Jeans tail of stellar emission lies. It is 
then plausible that the emission we find in this band may 
be entirely attributed to stellar emission. Since our detec- 
tion of PAH and dust emission, particularly for the spheroid 
population, is supported strongly by this data point (given 
its high signal-to- noise) , in this section, we investigate the 
contribution of pure stellar emission to the observed 24 [im 
band. 



Any additional emission not attributed to stellar emis- 
sion is likely associated with PAH emission, which in turn 
is accompanied by longer wavelength dust emission that we 
have inferred our SFRs from. We note that this emission 
may be associated with either star forming regions or evolv- 
ing main sequence stars (such as AGB and TP-AGB stars). 
Typically, emission associated with star formation domi- 
nates in most galaxies (even those with moderate SFRs) 
as the infrared light-to-mass ratio is up to three orders of 
magnitude larger for a simple stellar population (SSP) o f 
10 7 yrs compared to an SSP of 10 9 yrs (|Piovan et al.ll2006l ). 
where TP-AGB emission would be most significant. 

To investigate the contribution of pure stellar emission 
in our sample, we calculate the predicted 24 u.m observed 
flux densities from stellar population synthesis models us- 
ing redshifts and stellar masses as per our catalog (see Sec- 
tion [27T]) . We opt to use a galaxy template with solar metal- 
licity and an exponentially declining SFR with an e-folding 
time of 500 Myr, generat ed with the stellar population syn - 
thesis code PEGASE.2 l|Fioc fc Rocca-Volmerangel Il997l ). 
Output from non-stellar emission or evolving main-sequence 
stars is not included, as the source of non-stellar emission at 
7 urn is assumed to be the same as that of the FIR emission. 
Assuming a formation redshift of z = 9, the galaxy ages 
range from 1.5 to 3Gyr and the predicted 24 urn flux den- 
sities due to stellar emission range from 1.3 to 8.8 uJy, de- 
pending primarily on the galaxy's redshift. For each stacked 
sample, we find the predicted contamination per galaxy from 
stellar emission is 3.0, 2.9, and 3.9 uJy for the entire sam- 
ple, the disk-like and spheroid-like populations, respectively. 
Contributing at most 20% (see Table [3]) to the observed 
24 \im flux densities of the spheroid population and less than 
5% to the observed 24 u.m flux densities of the disk-like and 
total sample, we conclude that the mid-infrared observations 
(rcstframe 7-8 u.m) included in our analysis are dominated 
by non-stellar emission (i.e., dust and PAH emission). 

4.3 Best-fit SEDs and Star-Formation Rates 

The best-fit SED and interquartile range to the stacked val- 
ues of the complete catalog are shown in the left panel of 
Figure [3] corresponding to a median (plus/minus Gaussian) 
[interquartile] temperature of T = 29.4± f 1 ) ^ [27.3, 31.6] K, 
luminosity of Lfir = 6.2+^ [4.7,8.0] x 10 11 L , and 
SFR = 63±" [48, 81] M yr" 1 '. 

As a sanity check, we compare our mo dified blackbody 
appro ximat ion to the best-fit template of ICharv fc Elbaj 
(2001, hereafter CE01). The purpose of this is simply to re- 
assure ourselves that an exponential approximation on the 
Wien side of the thermal SED is not an unreasonable way 
to estimate the contribution to the bolometric luminosity 
short of the SED peak, rather than an attempt to derive 
SFRs from fitting SED templates. Thus, for each of the 101 
templates, we approximate the stacked SED by taking the 
average of templates shifted to the redshift of each galaxy 
in the catalog; this acts to smear out the otherwise highly- 
variable PAH region of the rest-frame SED probed by the 
24 u.m band. We fit the resulting template to our photomet- 
ric points without accounting for calibration uncertainties, 
color corrections, or correlations among bands. The best-fit 
template is shown as a 3-dot-dashed line in Figure O and 
falls well inside our error region. However, the SFR of the 
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All Disk-like Spheroid-like 

i 1 1 — 1 1 1 1 — ^-r^ ' — 1 1 1 1 — 1 1 1 — 




Wovelength [jlim] 

Figure 3. SED fits to the stacked flux densities of all (left), disk-like (center), and spheroid-like (right) sources. The median value of 
the redshift distribution, z ~ 2.3, is used here to convert flux densities into rest-frame luminosity. The brown crosses are from Spitzer 
(24 (am); the blue dots are from PACS (70, 100, and 160 nm); the green squares are from BLAST (250, 350, and 500 um); and the red 
asterisks are from LABOCA (870 um). The error bars represent the 1-cr Gaussian uncertainties from the stacks as listed in Table [3] 
The SED is modeled as a modified black body with a fixed emissivity index /3 = 1.5, and a power-law approximation on the Wien side 
with slope a = 2. The solid black lines are the best-fit SEDs, while the dotted light-blue lines enclosing the shaded regions show the 
uncertainties due to the width of the redshift distribution (interquartile range), which clearly dominat e over the Gauss i an err ors on the 
stacks (see Section 14.11 1. The navy 3-dot-dashed lines are the best-fit, redshift-averaged templates from ICharv fc Elbaj i200ll) . 



best-fit template is SFR = 87M Q yr _1 , or ~ 38% larger 
than our modified blackbody estimate. This overestimate 
likely arises because the fit with the CE01 template does 
not include the substantial correlations among bands (see 
Section [3. 5) , which reduce the significance of the combina- 
tion of individual photometric points. 

We then separately fit the stacked flux densities mea- 
sured for disk-like and spheroid-like galaxies. The best- 
fit modified blackbody SED for the disk-like popula- 
tion is shown in the center panel of Figure O and re- 
sults in a median (plus/minus Gaussian) [interquartile] 
temperature of T = 32.6lJ^ [30.8, 34.6] K, luminosity 
of Lpir = 12.0±i;s [9.8,14.8] x lO n L , and SFR = 
122±J| [100, 150]MQyr _1 . The best-fit CE01 template is 
also shown, and corresponds to a SFR = 142 Mq yr _1 . 

Likewise, the best-fit modified blackbody SED for the 
spheroid-like population is shown in the right panel of Fig- 
ure [3] and results in a median (plus/minus Gaussian) [in- 
terquartile] temperature of T = 27.6±° 7 3 6 [24.2, 30.8] K, lu- 
minosity of Lpir = 1.4± ;g [0.9, 2.0] x 10 11 L , and SFR = 
14^g [9, 20] Mq yr _1 . Note that the lower Gaussian errors 
exceed the lower bound of the interquartile range, thus re- 
flecting the elevated level of uncertainty in our measurement. 
Once again, the best-fit CE01 template is shown, which cor- 
responds to a SFR = 16MQyr _1 . 

Finally, to check that contributions to the rest-frame 
SED from PAHs, which are highly variable, are not signif- 
icantly influencing the best-fit result, we re-fit the modi- 
fied black body after excluding: i) just the 24 (im point; 
and ii) all points below rest-frame 100 fim. In the first 
scenario we find SFR = 57±? 4 [42, 72], 1091^ [91,135], 
and 12^ 7 [8, 17] Mq yr" 1 , while in the second scenario 
we find SFR = 67±H [50, 83], 129±|| [107,160], and 
30±7° [19,42]M yr"\ for all, disk-like, and spheroid-like 
galaxies, respectively. In the first case, the SFRs decrease 



only marginally, and within the error bars, suggesting that 
the 24 /im point alone does not unreasonably influence the 
result. In the second case, SFRs for all and disk-like galaxies 
are mildly affected, while the spheroid-like galaxies are ar- 
tificially boosted by a factor of two simply because we have 
removed the two data points consistent with zero. 



Thus, although the best-fit SED to the combined stack 
returns a robust, 4-cr detection, it is clear that signal is dom- 
inated by the disk-like, n ^ 2 galaxies, which are detected at 
5-cr. The best-fit to the spheroid-like, n > 2 galaxies, on the 
other hand, returns a marginal 2-a result, which suggests, 
but does not formally detect, a low level of star formation 
taking place in the spheroid-like population. 



We note that if correlations between bands are not prop- 
erly accounted for when finding the best-fit SED, the cor- 
responding SFRs are 94, 163, and 32MQyr _1 for all, disk- 
like, and sphere-like galaxies, respectively. This is signifi- 
cantly different, and the reason is that if correlations be- 
tween bands are not considered, more weight is attributed 
to the BLAST measurements than is appropriate, pulling 
the best-fit up. Intuitively this makes sense: since confusion 
noise arises from multiple sources in a beam, a larger beam 
has more sources in it and thus more variance, i.e., more 
confusion noise. Of course, for bands of similar wavelengths 
those sources are more or less present in each map, resulting 
in confusion noise that is not independent. Though this will 
be less of a problem for //ersc/ieZ/SPIRE, the beam size and 
thus improvement in confusion noise is only of order ~ 2x , 
so that correctly accounting for correlated confusion noise 
will still be very important. 



8 View, Moncelsi, et al. 



5 DISCUSSION 

5.1 Consequences for Galaxy Growth 

There are indications that massive galaxies at high red- 
shift are the cores of present-day ma ssive ellipticals 
ijHopkins et al.l 120091 ; iBezanson et all 120091 ). and that the 
growth of these galaxies takes place m ostly in the outskirts 
via star formation and m inor mergers ijHopkins et al.ll2009l ; 
van Dokkum etaH l20ld ) - a process sometimes referred 
to as "inside-out" growth, which has also be en observed 
in hydrodynamical co s molog i cal simulat i ons ( Naab et al.l 
l2009t ; iJohansson et all l2009j; lOser et all 12010 ). Further- 
more, Ivan Dokkum etafl I^OIO ) find that a SFR of 
55±13 Mq yr _1 at z ~ 2 is necessary to account for the 
mass growth they observe in massive galaxies selected by 
number density, from z — 2 to the present day, and that 
for z > 1.5 the mechanism for growth is primarily star for- 
mation. Note that nearly half of their z ~ 2 subsample of 
massive galaxies has n < 2 (see right panel of their Fig- 
ure 7) — a fraction similar to our own. Our measurement 
of 63 [48, 81] Mq yr _1 for the entire sample agrees well with 
their finding; however, we do not find convincing evidence 
that star formation is the mechanism driving the expansion 
in spheroid-like galaxies. 

5.2 Potential contribution from other sources of 
dust heating 

Star formation may not be the only explanation for infrared 
emission in our sample which consists of very massive, yet 
relatively young systems. The age of the universe by z = 3- 
1.8 is just ~ 1.5-3 Gyr, providing a strict upper limit on the 
ages of the stellar populations. If these galaxies formed the 
bulk of their stellar mass, as their colors suggest, early on, 
then it is likely that they contain a large population of stars 
undergoing post-main-sequence phases in which carbona- 
ceous dusty material is being produced and heated by very 
luminous stars. While it is generally accepted in t he current 
versions of stellar population synthesis m odels l|Marastonl 
l2005l ; lBruzua]|2010l ; IConrov fc Gunnll2O10D that thermally- 
pulsating asymptotic giant branch (TP-AGB) stars can con- 
tribute up to 70% of the emission seen in the near-infrared 
bands at ages of 1-2 Gyr, there has been little work calibrat- 
ing the global contribution of this population to a galaxy's 
infrared luminosity. By extension, given the masses and ages 
of our galaxies, we cannot rule out the possibility that the in- 
frared emission we have detected in our analysis is partially 
due to dust heated and created by post-main-sequence stars. 

5.3 Red and Dead? 

Our best-fit SED to stacked data does not correspond to 
a formal detection of star formation in the spheroid-like 
(71 > 2) galaxies, however, the high 24 u.m flux might indi- 
cate a non-zero star-formation rate. Though we have stated 
that 24 u.m emission alone is insufficient for accurately esti- 
mating the level of star formation in a galaxy, locally, 24 U-m 
emissi on is typically well correlated with star- forming re- 
gions dCalzetti et al.ll2007l ; lKennicutt et~ai1l2009h . Addition- 
ally, emission from evolved stars seems unable to account 
for the level of 24 u.m emission observed (§ I4.2[) . Therefore, 



it seems plausible that star formation may be occurring in 
these galaxies at some level. Furthermore, if a low level of 
star formation does indeed exist, given the noise properties 
of our maps, the only bands which would permit a significant 
detection are the 24 and 870 u.m bands — those in which our 
measurements have signal-to-noise greater than 2.5. 

If star formation is occurring in the spheroid-like galax- 
ies, even at a low level, and if they are fair analogs of the 
apparently r ed-an d-dead compact spheroids seen by e.g., 
iKriek et al.1 l|2009h . then why is it that star formation is 
not significant in ultra-deep spectroscopy? One possibility is 
that the star formation is l ocalized in very du st-obscured re- 
gions. Note that although IKriek et al.1 l|2009h detect a faint 
Ha line, concluding that SFRs are at most 2-4MQyr _1 , 
that is after correcting for a very moderate amount of ex- 
tinction (A„ = 0-0.3 mag). For this galaxy to actually be 
forming around 14MQyr _1 , Lu a would need to have been 
underestimated by a factor of ~ 3.5-7, which corresponds 
to 1.4-2.1 mag of extinction. Considering that resolved ob- 
servations of nearby galaxies showing extinction values o f 
Ahc« > 3 are common in Hn regions jPrcscott ct al. 2007) 
and regions of high star formation (jMentuch et al.ll2010h . 
this amount of extinction is not unrealistic. 

Lastly, note that our low lev els of obser v ed st ar for- 
mation are in disagreement with ICava et al.l (|2010l ). who 
(after correcting by 0.23 dex due to differences in the as- 
sumed IMF) find an average SFR of 30-60 Mq yr _1 for the 
spheroid-like galaxies. Their average SFRs are based on pho- 
tometry of individual galaxies at 24 urn, and at 250, 350, 
and 500 u.m from //er,sc/ie//SPIRE with a mean detection 
fraction for the spheroid-like population of ~ 0.4 at 24 u.m 
and ~ 0.15 at 250 uxn. This selection makes it difficult to 
properly compare measurements. 



6 SUMMARY 

Our goal was to search for evidence of star formation in 
high-redshift massive galaxies, with the hope of leading to 
a better understanding of the mechanisms responsible for 
their growth. We found that on average the full catalog 
of sources are forming stars with a median [interquartile] 
SFR = 63 [48, 81] Mq yr" 1 , which can be decomposed into 
a relatively strong signal for the disk-like galaxies, with a 
median [interquartile] SFR = 122 [100, 150] Mq yr _1 , and 
a marginal signal for the spheroid-like population, with a 
median [interquartile] SFR = 14 [9, 20] M yr~\ 

The level of star-formation detected for the full cata- 
log is in goo d agreement with other m easurements of galaxy 
growth (e.g.. Ivan Dokkum etHlbOld ) which show that star 
formation can account for most of the growth at these red- 
shifts. However, despite having detected stacked emission at 
24 and 870 urn, we are unable to say convincingly that star 
formation is responsible for the dramatic size evolution of 
the spheroid-like population. 

Lastly, tho ugh a red sequenc e appears to already be in 
place by z ~ 2 (|Kriek et alj |2009). we found hints that per- 
haps the red, compact, spheroid- like galaxies may not be 
completely dead. Future stacking work with larger catalogs 
and better maps will go a long way to further understanding 
this question. Better d ata bracketing the p eak with SPIRE 
(250, 350, and 500 um; iGriffin et alTl201of) will make more 
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robust estimates of the SED possible, and will greatly in- 
crease our understanding of star formation in high-redshift 
massive galaxies. 
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